Arbitrary style transfer (AST) transfers arbitrary artistic styles onto content images. Despite the recent rapid progress, existing AST methods are either incapable or too slow to run at ultra-resolutions (e.g., 4K) with limited resources, which heavily hinders their further applications. In this paper, we tackle this dilemma by learning a straightforward and lightweight model, dubbed MicroAST. The key insight is to completely abandon the use of cumbersome pre-trained Deep Convolutional Neural Networks (e.g., VGG) at inference. Instead, we design two micro encoders (content and style encoders) and one micro decoder for style transfer. The content encoder aims at extracting the main structure of the content image. The style encoder, coupled with a modulator, encodes the style image into learnable dual-modulation signals that modulate both intermediate features and convolutional filters of the decoder, thus injecting more sophisticated and flexible style signals to guide the stylizations. In addition, to boost the ability of the style encoder to extract more distinct and representative style signals, we also introduce a new style signal contrastive loss in our model. Compared to the state of the art, our MicroAST not only produces visually superior results but also is 5-73 times smaller and 6-18 times faster, for the first time enabling super-fast (about 0.5 seconds) AST at 4K ultra-resolutions. Code is available at https://github.com/EndyWon/MicroAST.
translated by 谷歌翻译
最近的研究表明,通用风格转移的成功取得了巨大的成功,将任意视觉样式转移到内容图像中。但是,现有的方法遭受了审美的非现实主义问题,该问题引入了不和谐的模式和明显的人工制品,从而使结果很容易从真实的绘画中发现。为了解决这一限制,我们提出了一种新颖的美学增强风格转移方法,可以在美学上为任意风格产生更现实和令人愉悦的结果。具体而言,我们的方法引入了一种审美歧视者,以从大量的艺术家创造的绘画中学习通用的人类自愿美学特征。然后,合并了美学特征,以通过新颖的美学感知样式(AESSA)模块来增强样式转移过程。这样的AESSA模块使我们的Aesust能够根据样式图像的全局美学通道分布和内容图像的局部语义空间分布有效而灵活地集成样式模式。此外,我们还开发了一种新的两阶段转移培训策略,并通过两种审美正规化来更有效地训练我们的模型,从而进一步改善风格化的性能。广泛的实验和用户研究表明,我们的方法比艺术的状态综合了美学上更加和谐和现实的结果,从而大大缩小了真正的艺术家创造的绘画的差异。我们的代码可在https://github.com/endywon/aesust上找到。
translated by 谷歌翻译
在本文中,我们介绍了纹理改革器,一个快速和通用的神经基础框架,用于使用用户指定的指导进行交互式纹理传输。挑战在三个方面:1)任务的多样性,2)引导图的简单性,以及3)执行效率。为了解决这些挑战,我们的主要思想是使用由i)全球视图结构对准阶段,ii)局部视图纹理细化阶段和III)的新的前馈多视图和多级合成程序。效果增强阶段用相干结构合成高质量结果,并以粗略的方式进行细纹细节。此外,我们还介绍了一种新颖的无学习视图特定的纹理改革(VSTR)操作,具有新的语义地图指导策略,以实现更准确的语义引导和结构保存的纹理传输。关于各种应用场景的实验结果展示了我们框架的有效性和优越性。并与最先进的交互式纹理转移算法相比,它不仅可以实现更高的质量结果,而且更加显着,也是更快的2-5个数量级。代码可在https://github.com/endywon/texture --reformer中找到。
translated by 谷歌翻译
运动估计是用于评估目标器官解剖学和功能的动态医学图像处理的基本步骤。然而,通过评估局部图像相似性通过评估局部图像相似性优化运动场的基于图像的运动估计方法,易于产生令人难以置信的估计,尤其是在大运动的情况下。在这项研究中,我们提供了一种新颖的稀疏密度(DSD)的运动估计框架,其包括两个阶段。在第一阶段,我们处理原始密集图像以提取稀疏地标以表示目标器官解剖拓扑,并丢弃对运动估计不必要的冗余信息。为此目的,我们介绍一个无监督的3D地标检测网络,以提取用于目标器官运动估计的空间稀疏但代表性的地标。在第二阶段,我们从两个不同时间点的两个图像的提取稀疏地标的稀疏运动位移得出。然后,我们通过将稀疏地标位移突出回致密图像域,呈现运动重建网络来构造运动场。此外,我们从我们的两级DSD框架中使用估计的运动场作为初始化,并提高轻量级且有效的迭代优化中的运动估计质量。我们分别评估了两种动态医学成像任务的方法,分别为模型心脏运动和肺呼吸运动。与现有的比较方法相比,我们的方法产生了出色的运动估计精度。此外,广泛的实验结果表明,我们的解决方案可以提取良好代表性解剖标志,而无需手动注释。我们的代码在线公开提供。
translated by 谷歌翻译
生物制药制造业是一个快速增长的行业,几乎对所有药品分支机构产生了影响。在具有许多相互依赖因素的复杂生物过程动力学的情况下,生物制造过程需要密切监测和控制,并且由于实验的高成本以及个性化的生物毒品的新颖性,因此数据非常有限。我们开发了一种新型的基于模型的增强学习框架,该框架可以在低数据表环境中实现人级控制。该模型使用动态的贝叶斯网络来捕获因素之间的因果关系,并预测不同输入的影响如何通过生物处理机制的途径传播。这使得在模型风险上既可以解释又有坚固的过程控制策略的设计。我们提出了一种计算有效的,可证明的收敛随机梯度方法,用于优化此类策略。验证是在具有多维,连续状态变量的现实应用程序上进行的。
translated by 谷歌翻译
Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications cannot or only marginally benefit from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distilling targets, losses, input, network regularization, sequential distillation, etc, revealing that: 1) Distilling token relations is more effective than CLS token- and feature-based distillation; 2) An intermediate layer of the teacher network as target perform better than that using the last layer when the depth of the student mismatches that of the teacher; 3) Weak regularization is preferred; etc. With these findings, we achieve significant fine-tuning accuracy improvements over the scratch MIM pre-training on ImageNet-1K classification, using all the ViT-Tiny, ViT-Small, and ViT-base models, with +4.2%/+2.4%/+1.4% gains, respectively. Our TinyMIM model of base size achieves 52.2 mIoU in AE20K semantic segmentation, which is +4.1 higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way for developing small vision Transformer models, that is, by exploring better training methods rather than introducing inductive biases into architectures as in most previous works. Code is available at https://github.com/OliverRensu/TinyMIM.
translated by 谷歌翻译
Given the increasingly intricate forms of partial differential equations (PDEs) in physics and related fields, computationally solving PDEs without analytic solutions inevitably suffers from the trade-off between accuracy and efficiency. Recent advances in neural operators, a kind of mesh-independent neural-network-based PDE solvers, have suggested the dawn of overcoming this challenge. In this emerging direction, Koopman neural operator (KNO) is a representative demonstration and outperforms other state-of-the-art alternatives in terms of accuracy and efficiency. Here we present KoopmanLab, a self-contained and user-friendly PyTorch module of the Koopman neural operator family for solving partial differential equations. Beyond the original version of KNO, we develop multiple new variants of KNO based on different neural network architectures to improve the general applicability of our module. These variants are validated by mesh-independent and long-term prediction experiments implemented on representative PDEs (e.g., the Navier-Stokes equation and the Bateman-Burgers equation) and ERA5 (i.e., one of the largest high-resolution data sets of global-scale climate fields). These demonstrations suggest the potential of KoopmanLab to be considered in diverse applications of partial differential equations.
translated by 谷歌翻译
In this chapter, we review and discuss the transformation of AI technology in HCI/UX work and assess how AI technology will change how we do the work. We first discuss how AI can be used to enhance the result of user research and design evaluation. We then discuss how AI technology can be used to enhance HCI/UX design. Finally, we discuss how AI-enabled capabilities can improve UX when users interact with computing systems, applications, and services.
translated by 谷歌翻译
Adversarial robustness assessment for video recognition models has raised concerns owing to their wide applications on safety-critical tasks. Compared with images, videos have much high dimension, which brings huge computational costs when generating adversarial videos. This is especially serious for the query-based black-box attacks where gradient estimation for the threat models is usually utilized, and high dimensions will lead to a large number of queries. To mitigate this issue, we propose to simultaneously eliminate the temporal and spatial redundancy within the video to achieve an effective and efficient gradient estimation on the reduced searching space, and thus query number could decrease. To implement this idea, we design the novel Adversarial spatial-temporal Focus (AstFocus) attack on videos, which performs attacks on the simultaneously focused key frames and key regions from the inter-frames and intra-frames in the video. AstFocus attack is based on the cooperative Multi-Agent Reinforcement Learning (MARL) framework. One agent is responsible for selecting key frames, and another agent is responsible for selecting key regions. These two agents are jointly trained by the common rewards received from the black-box threat models to perform a cooperative prediction. By continuously querying, the reduced searching space composed of key frames and key regions is becoming precise, and the whole query number becomes less than that on the original video. Extensive experiments on four mainstream video recognition models and three widely used action recognition datasets demonstrate that the proposed AstFocus attack outperforms the SOTA methods, which is prevenient in fooling rate, query number, time, and perturbation magnitude at the same.
translated by 谷歌翻译
Reading comprehension of legal text can be a particularly challenging task due to the length and complexity of legal clauses and a shortage of expert-annotated datasets. To address this challenge, we introduce the Merger Agreement Understanding Dataset (MAUD), an expert-annotated reading comprehension dataset based on the American Bar Association's 2021 Public Target Deal Points Study, with over 39,000 examples and over 47,000 total annotations. Our fine-tuned Transformer baselines show promising results, with models performing well above random on most questions. However, on a large subset of questions, there is still room for significant improvement. As the only expert-annotated merger agreement dataset, MAUD is valuable as a benchmark for both the legal profession and the NLP community.
translated by 谷歌翻译